##  [1] "Census Tract"                "Total Population"           
##  [3] "California County"           "ZIP"                        
##  [5] "Approximate Location"        "Longitude"                  
##  [7] "Latitude"                    "CES 4.0 Score"              
##  [9] "CES 4.0 Percentile"          "CES 4.0 Percentile Range"   
## [11] "Ozone"                       "Ozone Pctl"                 
## [13] "PM2.5"                       "PM2.5 Pctl"                 
## [15] "Diesel PM"                   "Diesel PM Pctl"             
## [17] "Drinking Water"              "Drinking Water Pctl"        
## [19] "Lead"                        "Lead Pctl"                  
## [21] "Pesticides"                  "Pesticides Pctl"            
## [23] "Tox. Release"                "Tox. Release Pctl"          
## [25] "Traffic"                     "Traffic Pctl"               
## [27] "Cleanup Sites"               "Cleanup Sites Pctl"         
## [29] "Groundwater Threats"         "Groundwater Threats Pctl"   
## [31] "Haz. Waste"                  "Haz. Waste Pctl"            
## [33] "Imp. Water Bodies"           "Imp. Water Bodies Pctl"     
## [35] "Solid Waste"                 "Solid Waste Pctl"           
## [37] "Pollution Burden"            "Pollution Burden Score"     
## [39] "Pollution Burden Pctl"       "Asthma"                     
## [41] "Asthma Pctl"                 "Low Birth Weight"           
## [43] "Low Birth Weight Pctl"       "Cardiovascular Disease"     
## [45] "Cardiovascular Disease Pctl" "Education"                  
## [47] "Education Pctl"              "Linguistic Isolation"       
## [49] "Linguistic Isolation Pctl"   "Poverty"                    
## [51] "Poverty Pctl"                "Unemployment"               
## [53] "Unemployment Pctl"           "Housing Burden"             
## [55] "Housing Burden Pctl"         "Pop. Char."                 
## [57] "Pop. Char. Score"            "Pop. Char. Pctl"

LOAD BAY PM 2.5 DATA

Darker red areas have higher concentrations of PM 2.5; the lighter the area, the lower the concentration. PM 2.5 is concentrated in the East Bay around Oakland, Richmond, and Vallejo, and in the South Bay around San Mateo, Fremont, Newark, and Mountain View.

LOAD AND MAP ASTHMA DATA

Darker red areas have higher asthma values; the lighter the area, the lower the asthma value. Asthma is concentrated in the East Bay around Oakland, Richmond, and Vallejo.

Clean the Data and Create the Scatter Plot with PM2.5 on the x-axis and Asthma on the y-axis

At this stage, the best-fit line does not follow the points closely. That is because there are several outliers and the points are not evenly spread around the line.
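The cleaning and plotting step can be sketched roughly as follows. This is a minimal base R sketch, not the code used for this report: the data frame `toy` and its values are made up to stand in for `bay_asthma_pm_tract`, and only the column names `PM2.5` and `Asthma` come from the dataset.

```r
# Toy data standing in for bay_asthma_pm_tract (all values are made up).
set.seed(1)
toy <- data.frame(PM2.5 = runif(200, min = 6, max = 11))
toy$Asthma <- exp(0.7 + 0.35 * toy$PM2.5 + rnorm(200, sd = 0.6))

# Drop rows with missing values before plotting or modeling.
toy <- toy[complete.cases(toy[, c("PM2.5", "Asthma")]), ]

# Scatter plot with the least-squares line drawn on top.
plot(toy$PM2.5, toy$Asthma, xlab = "PM2.5", ylab = "Asthma")
fit <- lm(Asthma ~ PM2.5, data = toy)
abline(fit, col = "red")
```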

Coefficients

## 
## Call:
## lm(formula = Asthma ~ PM2.5, data = bay_asthma_pm_tract)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54.47 -25.89  -9.61  12.94 182.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37.49 on 1578 degrees of freedom
## Multiple R-squared:  0.09606,    Adjusted R-squared:  0.09549 
## F-statistic: 167.7 on 1 and 1578 DF,  p-value: < 2.2e-16

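A one-unit increase in PM 2.5 is associated with an increase in Asthma of 19.862, with a p-value below 2e-16, which falls in the most significant range of the output (under 0.001, three stars). The R-squared of 0.096 means PM 2.5 alone explains only about 10% of the variation in Asthma across tracts. The positive association also makes sense conceptually and is a big part of what EJ activists are fighting about when it comes to Cap and Trade not covering local air pollution.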
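To make the slope concrete, the fitted line can be evaluated by hand at two PM 2.5 values one unit apart. The coefficients below are copied from the summary output above; the helper `predict_asthma` and the input values 8 and 9 are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the lm(Asthma ~ PM2.5) summary output.
intercept <- -116.278
slope     <- 19.862

# Hypothetical helper: predicted Asthma for a given PM2.5 value.
predict_asthma <- function(pm25) intercept + slope * pm25

# Predictions one unit of PM2.5 apart differ by exactly the slope.
predict_asthma(9) - predict_asthma(8)
```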

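Plot Residuals

Ideally, the residual plot looks like a bell curve centered on zero. This curve is not symmetric: most residuals sit to the left of zero (median -9.61) and a long tail stretches to the right (maximum 182.95), so the distribution is right-skewed.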
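A residual density plot of this kind can be produced along these lines. This is a sketch on made-up data (the `toy` data frame is an assumption, not the report's dataset); it uses only base R calls.

```r
# Toy data standing in for bay_asthma_pm_tract (all values are made up).
set.seed(1)
toy <- data.frame(PM2.5 = runif(200, min = 6, max = 11))
toy$Asthma <- exp(0.7 + 0.35 * toy$PM2.5 + rnorm(200, sd = 0.6))

fit <- lm(Asthma ~ PM2.5, data = toy)
res <- residuals(fit)

# Density of the residuals; a symmetric bell shape around 0 is the ideal.
plot(density(res), main = "Residuals of Asthma ~ PM2.5")

# Comparing mean and median is a quick check for a lopsided distribution.
c(mean = mean(res), median = median(res))
```

The mean of the residuals is always about zero for a model with an intercept, so a median well below zero is one sign of a long right tail.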

Log-Transform Asthma and Create the Scatter Plot with PM2.5 on the x-axis and log(Asthma) on the y-axis

Coefficients

## 
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = bay_asthma_pm_tract)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00402 -0.46479  0.03313  0.42298  1.75525 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.69234    0.22840   3.031  0.00248 ** 
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6566 on 1578 degrees of freedom
## Multiple R-squared:  0.1003, Adjusted R-squared:  0.09974 
## F-statistic: 175.9 on 1 and 1578 DF,  p-value: < 2.2e-16

This distribution looks much better, though the dip where the peak of the bell curve should be is a bit confusing. After the log transformation, the relationship between asthma and PM 2.5 is more even and the residuals look closer to normal.
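Because the outcome is now log(Asthma), the PM2.5 coefficient reads as a multiplicative effect rather than an additive one. A short calculation, using the coefficient from the output above (the variable names are only for illustration), converts it to a percent change:

```r
# PM2.5 coefficient copied from the lm(log(Asthma) ~ PM2.5) output.
slope_log <- 0.35633

# A one-unit increase in PM2.5 multiplies Asthma by exp(slope_log),
# so the percent change is (exp(slope_log) - 1) * 100.
pct_change <- (exp(slope_log) - 1) * 100
round(pct_change, 1)
```

Under this model, a one-unit increase in PM 2.5 is associated with roughly a 43% higher Asthma value.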

Add Residuals As a Column to the Spatial Dataset

Explain what a low residual means in the context of Asthma estimation (i.e., under- or over-estimation), and why you think this census tract in particular has one.

A residual is the actual y minus the predicted y. A residual close to zero means the prediction was accurate, while a low (strongly negative) residual means the model over-estimated Asthma: the actual y was much lower than the predicted y. We over-estimated in San Jose, which resulted in a negative residual. The tracts furthest north, south, and east were all under-estimated: the actual y was higher than the predicted y, giving positive residuals. Where residuals are close to zero, the regression's estimate is similar to what the actual data shows; our smallest positive residual was 0.1.
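The sign convention can be checked on a small example. This sketch uses five made-up tracts (the `toy` data frame and its `tract` labels are hypothetical) and a plain data frame in place of the spatial dataset.

```r
# Toy tracts (made-up values) standing in for the spatial dataset.
toy <- data.frame(
  tract  = c("A", "B", "C", "D", "E"),
  PM2.5  = c(7, 8, 9, 10, 11),
  Asthma = c(30, 20, 60, 55, 95)
)
fit <- lm(Asthma ~ PM2.5, data = toy)

# Residual = actual - predicted, stored as a new column.
toy$residuals <- residuals(fit)

# Most negative residual: the tract the model over-estimates the most.
toy[which.min(toy$residuals), ]

# Most positive residual: the tract the model under-estimates the most.
toy[which.max(toy$residuals), ]
```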